The Best 92 Molecular Model Tools in 2025
Chemberta Zinc Base V1
A Transformer model based on the RoBERTa architecture, designed for masked language modeling of chemical SMILES strings
Molecular Model
seyonec · 323.83k downloads · 48 likes
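
To illustrate the masked-SMILES objective, here is a minimal sketch using the `transformers` fill-mask pipeline; the hub id `seyonec/ChemBERTa-zinc-base-v1` and the RoBERTa-style `<mask>` token are assumptions based on the entry above.

```python
from transformers import pipeline

# Minimal sketch: assumed hub id; RoBERTa-style checkpoints use "<mask>".
fill = pipeline("fill-mask", model="seyonec/ChemBERTa-zinc-base-v1")

# Ask the model to complete the masked position of an aspirin-like SMILES.
for pred in fill("CC(=O)Oc1ccccc1C(=O)<mask>"):
    print(pred["token_str"], round(pred["score"], 3))
```
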
Molformer XL Both 10pct
Apache-2.0
MoLFormer is a chemical language model pre-trained on 1.1 billion molecular SMILES strings from ZINC and PubChem. This version was trained on a 10% sample of each dataset.
Molecular Model
Transformers

ibm-research · 171.96k downloads · 19 likes
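
A sketch of extracting molecular embeddings; the hub id `ibm-research/MoLFormer-XL-both-10pct` and the custom loading options (`trust_remote_code`, `deterministic_eval`) are assumptions about this checkpoint's remote code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ibm-research/MoLFormer-XL-both-10pct"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, deterministic_eval=True,
                                  trust_remote_code=True)

smiles = ["Cn1c(=O)c2c(ncn2C)n(C)c1=O", "CC(=O)Oc1ccccc1C(=O)O"]
inputs = tokenizer(smiles, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.pooler_output.shape)  # one embedding vector per molecule
```
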
Evo 1 8k Base
Apache-2.0
Evo is a biological foundation model capable of long-context modeling and design. It uses the StripedHyena architecture and models sequences at single-nucleotide, byte-level resolution.
Molecular Model
Transformers

togethercomputer · 31.09k downloads · 9 likes
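
A minimal generation sketch, assuming the hub id `togethercomputer/evo-1-8k-base` and that the StripedHyena implementation loads through `AutoModelForCausalLM` with `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/evo-1-8k-base"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Autoregressively extend a DNA prompt; Evo tokenizes at the byte level,
# so each nucleotide is a single token.
inputs = tokenizer("ACGTACGT", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0]))
```
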
Evo 1 131k Base
Apache-2.0
Evo is a biological foundation model capable of long-context modeling and design, using the StripedHyena architecture to model sequences at single-nucleotide, byte-level resolution; this version was pre-trained with a 131k-token context.
Molecular Model
Transformers

togethercomputer · 22.70k downloads · 108 likes
Materials.smi Ted
Apache-2.0
A chemical language foundation model from IBM, supporting tasks such as molecular representation conversion and quantum property prediction
Molecular Model
Transformers

ibm-research · 20.65k downloads · 27 likes
Tabpfn V2 Clf
TabPFN is a Transformer-based foundation model for tabular data. Pre-trained on synthetic prior data, it achieves strong performance on small tabular datasets without task-specific training.
Molecular Model
Prior-Labs · 20.09k downloads · 33 likes
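
Because TabPFN needs no task-specific training, usage reduces to a scikit-learn-style fit/predict; this sketch assumes the `tabpfn` package, which wraps checkpoints like this one behind an estimator interface.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumes the `tabpfn` package is installed

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "fit" only conditions the prior-fitted network on the training rows;
# no gradient-based training happens here.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```
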
Tabpfn Mix 1.0 Classifier
Apache-2.0
A foundation model for tabular data, pre-trained on synthetic datasets generated by mixing random classifiers
Molecular Model
autogluon · 19.77k downloads · 13 likes
Nucleotide Transformer V2 50m Multi Species
The Nucleotide Transformer is a family of foundation language models pre-trained on whole-genome DNA sequences, integrating genomic data from 3,202 diverse human genomes and 850 species.
Molecular Model
Transformers

InstaDeepAI · 18.72k downloads · 3 likes
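
A sketch of pulling per-token DNA embeddings, assuming the hub id `InstaDeepAI/nucleotide-transformer-v2-50m-multi-species` and that the v2 checkpoints ship custom code loaded via `trust_remote_code=True`.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "InstaDeepAI/nucleotide-transformer-v2-50m-multi-species"  # assumed
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("ATTCCGATTCCGATTCCG", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
print(hidden.shape)  # (batch, tokens, hidden); mean-pool for a sequence embedding
```
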
Multitask Text And Chemistry T5 Base Augm
MIT
A multi-domain, multi-task language model designed to address a wide range of tasks across chemistry and natural language.
Molecular Model
Transformers English

GT4SD · 11.01k downloads · 8 likes
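
A translation-style sketch, assuming the hub id `GT4SD/multitask-text-and-chemistry-t5-base-augm`; the prompt wording below is illustrative, and the exact task prefixes should be taken from the model card.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "GT4SD/multitask-text-and-chemistry-t5-base-augm"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative prompt: ask the model to describe a molecule given its SMILES.
prompt = "Caption the following molecule: CC(=O)Oc1ccccc1C(=O)O"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
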
Rnaernie
RNAErnie is a self-supervised model pre-trained on non-coding RNA sequences. It uses a multi-stage masked language modeling objective to provide strong feature representations for RNA research.
Molecular Model
PyTorch
multimolecule · 11.00k downloads · 1 like
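
A sketch of extracting RNA representations, assuming the `multimolecule` package exposes `RnaTokenizer` and `RnaErnieModel` and hosts this checkpoint at `multimolecule/rnaernie`.

```python
import torch
from multimolecule import RnaTokenizer, RnaErnieModel  # assumed class names

tokenizer = RnaTokenizer.from_pretrained("multimolecule/rnaernie")
model = RnaErnieModel.from_pretrained("multimolecule/rnaernie")

# Encode a short non-coding RNA sequence and take the token-level features.
inputs = tokenizer("UAGCUUAUCAGACUGAUGUUGA", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```
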
Plantcaduceus L20
Apache-2.0
PlantCaduceus is a DNA language model pre-trained on 16 angiosperm genomes, utilizing Caduceus and Mamba architectures to learn evolutionary conservation and DNA sequence syntax through masked language modeling objectives.
Molecular Model
Transformers

kuleshov-group · 8,967 downloads · 1 like
Geneformer
Apache-2.0
A Transformer model pre-trained on large-scale single-cell transcriptome corpora for network biology prediction
Molecular Model
Transformers

ctheodoris · 8,365 downloads · 227 likes
Nucleotide Transformer 500m 1000g
A 500-million-parameter DNA sequence analysis model pre-trained on 3,202 genetically diverse human genomes
Molecular Model
Transformers

InstaDeepAI · 8,341 downloads · 6 likes
Rnabert
RNABERT is a pre-trained model based on non-coding RNA (ncRNA), employing Masked Language Modeling (MLM) and Structural Alignment Learning (SAL) objectives.
Molecular Model Other
multimolecule · 8,166 downloads · 4 likes
Caduceus Ph Seqlen 131k D Model 256 N Layer 16
Apache-2.0
Caduceus-Ph is a DNA sequence model based on the MambaDNA architecture, with a hidden dimension of 256 and 16 layers.
Molecular Model
Transformers

kuleshov-group · 5,455 downloads · 6 likes
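
A masked-modeling sketch, assuming the hub id `kuleshov-group/caduceus-ph_seqlen-131k_d_model-256_n_layer-16` and that the custom Caduceus code loads with `trust_remote_code=True`.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "kuleshov-group/caduceus-ph_seqlen-131k_d_model-256_n_layer-16"  # assumed
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("ACGTACGTACGT", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # per-position nucleotide logits
```
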
Agro Nucleotide Transformer 1b
AgroNT is a DNA language model trained on edible plant genomes, capable of learning universal representations of nucleotide sequences.
Molecular Model
Transformers

InstaDeepAI · 4,869 downloads · 13 likes
Nucleotide Transformer 500m Human Ref
A 500M-parameter Transformer model pre-trained on the human reference genome, part of the Nucleotide Transformer collection built from over 3,200 diverse human genomes and 850 species
Molecular Model
Transformers

InstaDeepAI · 4,482 downloads · 12 likes
Bert Base Smiles
OpenRAIL
A bidirectional transformer model pre-trained on SMILES (Simplified Molecular Input Line Entry System) strings, primarily for molecule-related tasks.
Molecular Model
Transformers

unikei · 3,688 downloads · 7 likes
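
A feature-extraction sketch, assuming the hub id `unikei/bert-base-smiles` and standard BERT classes.

```python
from transformers import BertModel, BertTokenizerFast

model_id = "unikei/bert-base-smiles"  # assumed hub id
tokenizer = BertTokenizerFast.from_pretrained(model_id)
model = BertModel.from_pretrained(model_id)

# Token-level features for one molecule; pool them for a molecule embedding.
inputs = tokenizer("CC(=O)Oc1ccccc1C(=O)O", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```
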
Materials.selfies Ted
Apache-2.0
A Transformer-based encoder-decoder model specifically designed for molecular representation using SELFIES
Molecular Model
Transformers

ibm-research · 3,343 downloads · 7 likes
Plantcaduceus L32
Apache-2.0
PlantCaduceus is a DNA language model pre-trained on the genomes of 16 angiosperm species, utilizing Caduceus and Mamba architectures to learn evolutionary conservation and DNA sequence syntax through masked language modeling objectives.
Molecular Model
Transformers

kuleshov-group · 3,340 downloads · 7 likes
Hyenadna Small 32k Seqlen Hf
BSD-3-Clause
HyenaDNA is a long-range genomic foundation model pre-trained at single-nucleotide resolution with a context length of up to 1 million tokens.
Molecular Model
Transformers Other

LongSafari · 2,885 downloads · 2 likes
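
A sketch for the HF-ported checkpoint, assuming the hub id `LongSafari/hyenadna-small-32k-seqlen-hf` and `trust_remote_code=True` for the custom Hyena layers.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LongSafari/hyenadna-small-32k-seqlen-hf"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Character-level tokenization: one token per base, so long inputs stay cheap.
inputs = tokenizer("ACTG" * 256, return_tensors="pt")
with torch.no_grad():
    out = model(inputs["input_ids"])
print(out.logits.shape)  # next-base logits at single-nucleotide resolution
```
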
GROVER
GROVER is a pre-trained DNA language model specifically designed to understand and generate contextual representations of human genomic sequences.
Molecular Model
Transformers

PoetschLab · 2,847 downloads · 14 likes
Nucleotide Transformer 2.5b Multi Species
A DNA sequence analysis model pre-trained on genomes from 850 species, supporting tasks such as molecular phenotype prediction
Molecular Model
Transformers

InstaDeepAI · 2,714 downloads · 38 likes
Caduceus Ps Seqlen 131k D Model 256 N Layer 16
Apache-2.0
Caduceus-PS is a DNA sequence model with reverse-complement equivariance, designed for processing long sequences.
Molecular Model
Transformers

kuleshov-group · 2,618 downloads · 14 likes
Geneformer
Apache-2.0
Geneformer is a Transformer model pre-trained on large-scale single-cell transcriptome data, specifically designed for scenarios with scarce network biology data, enabling context-aware predictions.
Molecular Model
Transformers

tdc · 1,127 downloads · 4 likes
Hyenadna Large 1m Seqlen Hf
BSD-3-Clause
HyenaDNA is a long-range genomic foundation model with a pre-training context length of up to 1 million tokens and single-nucleotide resolution.
Molecular Model
Transformers Other

LongSafari · 775 downloads · 25 likes
Chemgpt 4.7M
ChemGPT is a generative molecular modeling Transformer model based on the GPT-Neo architecture, pretrained on the PubChem10M dataset.
Molecular Model
Transformers

ncfrey · 652 downloads · 20 likes
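
A generation sketch, assuming the hub id `ncfrey/ChemGPT-4.7M`; note that ChemGPT was trained on SELFIES strings, so the prompt and output are SELFIES tokens rather than SMILES.

```python
from transformers import pipeline

# Assumed hub id; the prompt is a SELFIES fragment (roughly ethanol).
generator = pipeline("text-generation", model="ncfrey/ChemGPT-4.7M")
result = generator("[C][C][O]", max_new_tokens=20, do_sample=True)
print(result[0]["generated_text"])
```
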
SMILES BERT
A BERT model trained on 50,000 SMILES strings for understanding and processing chemical molecular representations
Molecular Model
Transformers

JuIm · 583 downloads · 4 likes
Dqn MountainCar V0
A DQN agent trained with stable-baselines3 to solve the MountainCar-v0 reinforcement learning environment.
Molecular Model
sb3 · 578 downloads · 1 like
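
Loading an sb3 checkpoint from the Hub follows one pattern regardless of task; this sketch assumes the `huggingface_sb3` helper package and the usual repo/filename naming convention.

```python
import gymnasium as gym
from huggingface_sb3 import load_from_hub  # helper for sb3 Hub checkpoints
from stable_baselines3 import DQN

# Assumed repo id and filename, following the usual sb3 convention.
path = load_from_hub(repo_id="sb3/dqn-MountainCar-v0",
                     filename="dqn-MountainCar-v0.zip")
model = DQN.load(path)

# Roll out one episode with the loaded policy.
env = gym.make("MountainCar-v0")
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```
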
Dna2vec
MIT
A Transformer-based DNA sequence embedding model supporting sequence alignment and genomics applications
Molecular Model
Transformers

roychowdhuryresearch · 557 downloads · 1 like
Segment Nt
SegmentNT is a DNA segmentation model based on the Nucleotide Transformer, capable of predicting the positions of several types of genomic elements in a sequence at single-nucleotide resolution.
Molecular Model
Transformers

InstaDeepAI · 546 downloads · 7 likes
Hubert Ecg Small
A self-supervised pre-trained foundation model for ECG analysis, supporting detection of 164 cardiovascular diseases
Molecular Model
Transformers

Edoardo-BS · 535 downloads · 2 likes
Pretrained Smiles Pubchem10m
A cheminformatics model pretrained on 10 million SMILES strings from the PubChem database, used primarily for molecular representation learning and chemical property prediction.
Molecular Model
Transformers

pchanda · 509 downloads · 1 like
Druggpt
GPL-3.0
DrugGPT is a generative drug design model based on the GPT-2 architecture, applying natural language processing techniques to the generation of drug-like molecules.
Molecular Model
Transformers

liyuesen · 495 downloads · 21 likes
Ppo CartPole V1
A reinforcement learning agent based on the PPO algorithm, trained to solve the pole-balancing task in the CartPole-v1 environment.
Molecular Model
sb3 · 449 downloads · 0 likes
Molt5 Small
Apache-2.0
MolT5-small is a molecule-language translation model built on a pre-trained T5, converting between molecular structures (SMILES) and natural language descriptions.
Molecular Model
Transformers

laituan245 · 443 downloads · 2 likes
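
A SMILES-to-text sketch, assuming the hub id `laituan245/molt5-small`; the task-finetuned variants (e.g. a smiles2caption checkpoint) follow the same call pattern and are likely better suited for direct translation.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_id = "laituan245/molt5-small"  # assumed hub id
tokenizer = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Translate a SMILES string into a natural-language description.
inputs = tokenizer("CC(=O)Oc1ccccc1C(=O)O", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
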
Chemgpt 1.2B
ChemGPT is a generative molecular model based on GPT-Neo, specialized in molecule generation for chemistry research.
Molecular Model
Transformers

ncfrey · 409 downloads · 14 likes
Gpt2 Zinc 87m
MIT
An autoregressive language model based on GPT2 architecture, specifically designed for generating drug-like molecules or embedding representations from SMILES strings
Molecular Model
Transformers

entropy · 404 downloads · 3 likes
Gena Lm Bert Large T2t
GENA-LM is an open-source family of foundation models for long DNA sequences, based on Transformer masked language models trained on human DNA.
Molecular Model
Transformers Other

AIRI-Institute · 386 downloads · 7 likes
Polync
Apache-2.0
The PolyNC model achieves rapid and accurate prediction of polymer properties by integrating natural language and chemical language.
Molecular Model
Transformers

hkqiu · 383 downloads · 3 likes
Leandojo Lean4 Tacgen Byt5 Small
MIT
LeanDojo is a retrieval-augmented theorem-proving system that combines language models with premise retrieval; this checkpoint generates Lean 4 proof tactics.
Molecular Model
Transformers

kaiyuy · 369 downloads · 13 likes
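
A tactic-generation sketch, assuming the hub id `kaiyuy/leandojo-lean4-tacgen-byt5-small`: the ByT5 model maps a pretty-printed Lean 4 proof state to a candidate next tactic.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "kaiyuy/leandojo-lean4-tacgen-byt5-small"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A pretty-printed Lean 4 goal; the model proposes a tactic to apply.
state = "n : ℕ\n⊢ gcd n n = n"
inputs = tokenizer(state, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
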
Uni 3DAR
MIT
Uni-3DAR is an autoregressive model that unifies various 3D tasks, focusing on the generation and understanding of microscopic structures such as molecules, proteins, and crystals.
Molecular Model
dptech · 359 downloads · 2 likes